Effective ranking with arbitrary passages
Authors
Abstract
Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material amongst otherwise irrelevant text. In this paper, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
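The abstract describes arbitrary passages as overlapping fragments of fixed or variable length. A related TREC report calls them fixed-length passages that could start at any word position. The sketch below shows one reading of that idea for the fixed-length case: slide a window of `length` words across each document and score the document by its best window. It assumes whitespace tokenization. It also uses a hypothetical `overlap_score`, a raw count of query-term occurrences, as a stand-in for the paper's similarity measure. All function names and the `step` parameter are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def arbitrary_passages(
    words: Sequence[str], length: int, step: int = 1
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start, passage) for overlapping windows of `length` words.

    With step=1 a passage may start at any word position. A document shorter
    than `length` yields a single passage covering the whole document.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(words) <= length:
        yield 0, list(words)
        return
    last = len(words) - length
    for start in range(0, last + 1, step):
        yield start, list(words[start:start + length])
    # With step > 1 the regular grid can stop short of the end; add a final
    # window so that every word falls inside at least one passage.
    if last % step != 0:
        yield last, list(words[last:])


def overlap_score(query: Sequence[str], passage: Sequence[str]) -> float:
    """Hypothetical stand-in similarity: count passage words that are query terms."""
    terms = set(query)
    return float(sum(1 for w in passage if w in terms))


def rank_documents(
    query: Sequence[str], docs: Dict[str, str], length: int = 100, step: int = 1
) -> List[Tuple[str, float]]:
    """Rank documents by the score of their best fixed-length passage."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        best = max(
            overlap_score(query, passage)
            for _, passage in arbitrary_passages(words, length, step)
        )
        scored.append((best, doc_id))
    # Highest best-passage score first; ties broken by document id.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(doc_id, score) for score, doc_id in scored]
```

Because every passage has the same length, a long document cannot outscore a short one just by containing more words. Its score depends only on its best window, which is the length-independence the abstract points to. Extending this to variable-length passages would require a length-normalized similarity, which is not shown here.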
Similar sources
TREC 7 Ad Hoc, Speech, and Interactive tracks at MDS/CSIRO
1 Overview. For the 1998 round of TREC, the MDS group, long-term participants at the conference, jointly participated with newcomers CSIRO. Together we completed runs in three tracks: ad-hoc, interactive, and speech. 2 Ad-hoc task. In TREC-5 we used document retrieval based on arbitrary passages [8, 9], or fixed-length passages that could start at any word position. Although far from the best runs i...
A Phased Ranking Model for Information Systems
To effectively sort and present relevant information pieces (e.g., answers, passages, documents) to human users, information systems rely on ranking models. Existing ranking models are typically designed for a specific task and therefore are not effective for complex information systems that require component changes or domain adaptations. For example, in the final stage of question answering, ...
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. But some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which...
Boosting weak ranking functions to enhance passage retrieval for Question Answering
We investigate the problem of passage retrieval for Question Answering (QA) systems. We adopt a machine learning approach and apply to QA a boosting algorithm initially proposed for ranking a set of objects by combining baseline ranking functions. The system operates in two steps. For a given question, it first retrieves passages using a conventional search engine and assigns each passage a ser...
TREC 7 Ad Hoc, Speech, and Interactive Tracks at MDS/CSIRO
In TREC-5 we used document retrieval based on arbitrary passages [8, 9], or fixed-length passages that could start at any word position. Although far from the best runs in TREC-5, these results were promising, in particular for long documents. In TREC-6 we continued with arbitrary passages, but our main emphasis was on comprehensive factor analysis of successful automatic query expansion and refine...
Journal title:
- JASIST
Volume: 52, Issue:
Pages: -
Publication date: 2001